Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration
Testing in Continuous Integration (CI) involves test case prioritization,
selection, and execution at each cycle. Selecting the most promising test cases
to detect bugs is hard if there are uncertainties about the impact of committed
code changes or if traceability links between code and tests are not
available. This paper introduces Retecs, a new method for automatically
learning test case selection and prioritization in CI with the goal of minimizing
the round-trip time between code commits and developer feedback on failed test
cases. The Retecs method uses reinforcement learning to select and prioritize
test cases according to their duration, time since last execution, and failure
history. In a constantly changing environment, where new test cases are created
and obsolete test cases are deleted, the Retecs method learns to prioritize
error-prone test cases higher under the guidance of a reward function and by
observing previous CI cycles. By applying Retecs on data extracted from three
industrial case studies, we show for the first time that reinforcement learning
enables fruitful automatic adaptive test case selection and prioritization in
CI and regression testing.
Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA'17), pp. 12-22. ACM.
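A minimal sketch of the kind of lightweight, adaptive prioritization loop described above, assuming a linear scoring function over three test-case features (duration, time since last execution, recent failure rate) and a simple reward for failing tests ranked early. This illustrates the general idea only, not the Retecs implementation; all names and parameter values are hypothetical.

```python
# Illustrative sketch of RL-based test case prioritization in a CI cycle.
# Not the Retecs implementation: the linear value estimate, feature encoding,
# and reward shaping below are assumptions made for illustration.
import random

class PrioritizationAgent:
    def __init__(self, n_features=3, lr=0.05, epsilon=0.1):
        self.weights = [0.0] * n_features   # linear score over test-case features
        self.lr = lr                        # learning rate
        self.epsilon = epsilon              # exploration rate

    def score(self, features):
        # Higher score = schedule earlier in the CI cycle.
        if random.random() < self.epsilon:
            return random.random()          # occasional exploration
        return sum(w * f for w, f in zip(self.weights, features))

    def update(self, features, reward):
        # Move the weights toward features of test cases that earned reward
        # (e.g. detected a failure early in the cycle).
        for i, f in enumerate(features):
            self.weights[i] += self.lr * reward * f

def ci_cycle(agent, test_cases, verdicts):
    """Prioritize, 'execute', and learn from one CI cycle.
    test_cases: dict name -> (duration, time_since_last_run, recent_failure_rate)
    verdicts:   dict name -> True if the test fails in this cycle
    """
    ranked = sorted(test_cases, key=lambda t: agent.score(test_cases[t]), reverse=True)
    for rank, name in enumerate(ranked):
        failed = verdicts.get(name, False)
        # Reward failing tests scheduled near the top; small penalty otherwise.
        reward = (1.0 - rank / len(ranked)) if failed else -0.1
        agent.update(test_cases[name], reward)
    return ranked

agent = PrioritizationAgent()
tests = {"t1": (0.2, 0.9, 0.8), "t2": (0.5, 0.1, 0.0), "t3": (0.3, 0.7, 0.6)}
print(ci_cycle(agent, tests, {"t1": True, "t3": True}))
```

Because new test cases simply appear as new feature vectors and deleted ones drop out of the dictionary, such an agent keeps adapting across cycles without an explicit retraining step, which is the setting the abstract describes.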
Industry–Academia Research Collaboration and Knowledge Co-creation: Patterns and Anti-patterns
Increasing the impact of software engineering research in the software industry and in society at large has long been a high-priority concern for the software engineering community. The problem of two cultures, research conducted in a vacuum (disconnected from the real world), or misaligned time horizons are just some of the many complex challenges standing in the way of successful industry–academia collaborations. This article reports on the experience of research collaboration and knowledge co-creation between industry and academia in software engineering as a way to bridge the research–practice collaboration gap. Our experience spans 14 years of collaboration between researchers in software engineering and the European and Norwegian software and IT industry. Using participant observation and interview methods, we collected and then analyzed an extensive record of qualitative data. Drawing upon the findings made and the experience gained, we provide a set of 14 patterns and 14 anti-patterns for industry–academia collaborations, aimed at supporting other researchers and practitioners in establishing and running research collaboration projects in software engineering.
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models
Causal Neural Network models have shown high levels of robustness to
adversarial attacks as well as an increased capacity for generalisation tasks
such as few-shot learning and rare-context classification compared to
traditional Neural Networks. This robustness is argued to stem from the
disentanglement of causal and confounder input signals. However, no
quantitative study has yet measured the level of disentanglement achieved by
these types of causal models or assessed how this relates to their adversarial
robustness.
Existing causal disentanglement metrics are not applicable to deterministic
models trained on real-world datasets. We, therefore, utilise metrics of
content/style disentanglement from the field of Computer Vision to measure
different aspects of the causal disentanglement for four state-of-the-art
causal Neural Network models. By re-implementing these models with a common
ResNet18 architecture we are able to fairly measure their adversarial
robustness on three standard image classification benchmarking datasets under
seven common white-box attacks. We find a strong association (r=0.820, p=0.001)
between the degree to which models decorrelate causal and confounder signals
and their adversarial robustness. Additionally, we find a moderate negative
association between the pixel-level information content of the confounder
signal and adversarial robustness (r=-0.597, p=0.040).
Comment: 12 pages, 3 figures
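A small sketch of the correlation analysis such a study implies: given one disentanglement score and one robustness score per model, compute a Pearson correlation across models. The model names and numbers below are placeholders, not the paper's data.

```python
# Correlating a per-model disentanglement score with adversarial robustness.
# All values below are invented for illustration only.
from scipy.stats import pearsonr

# Hypothetical per-model measurements:
#   disentanglement: e.g. 1 - |correlation between causal and confounder signals|
#   robustness:      e.g. mean accuracy under a set of white-box attacks
models = {
    "model_a": {"disentanglement": 0.91, "robustness": 0.62},
    "model_b": {"disentanglement": 0.74, "robustness": 0.48},
    "model_c": {"disentanglement": 0.55, "robustness": 0.41},
    "model_d": {"disentanglement": 0.38, "robustness": 0.30},
}

x = [m["disentanglement"] for m in models.values()]
y = [m["robustness"] for m in models.values()]
r, p = pearsonr(x, y)  # association between disentanglement and robustness
print(f"r = {r:.3f}, p = {p:.3f}")
```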
Optimal Minimisation of Pairwise-covering Test Configurations Using Constraint Programming
Context: Testing highly-configurable software systems is challenging due to the large number of test configurations that have to be carefully selected in order to reduce the testing effort as much as possible while maintaining high software quality. Finding the smallest set of valid test configurations that ensures sufficient coverage of the system's feature interactions is thus the objective of validation engineers, especially when the execution of test configurations is costly or time-consuming. However, this problem is NP-hard in general, and approximation algorithms have often been used to address it in practice.
Objective: In this paper, we explore an alternative approach based on constraint programming that allows engineers to increase the effectiveness of configuration testing while keeping the number of configurations as low as possible.
Method: Our approach uses a (time-aware) minimisation algorithm based on constraint programming. Given a time budget, our solution generates a minimised set of valid test configurations that ensures coverage of all pairs of feature values (a.k.a. pairwise coverage). The approach has been implemented in a tool called PACOGEN.
Results: PACOGEN was evaluated on 224 feature models in comparison with two existing tools based on a greedy algorithm. For 79% of the 224 feature models, PACOGEN generated up to 60% fewer test configurations than the competitor tools. We further evaluated PACOGEN in a case study of large, highly-configurable industrial video conferencing software with a feature model of 169 features, and found 60% fewer configurations compared with the manual approach followed by test engineers. The set of test configurations generated by PACOGEN reduced the time required by test engineers for manual test configuration by 85%, while increasing feature-pair coverage at the same time.
Conclusion: Extensive evaluation showed that optimal minimisation of pairwise-covering test configurations is efficiently addressed using constraint programming techniques.
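To make the pairwise-covering objective concrete, here is a toy sketch: it enumerates the 2-way feature-value interactions of a handful of made-up configurations and greedily selects a small covering subset. PACOGEN itself minimises this objective with constraint programming over a feature model; the greedy loop below is only a stand-in to illustrate what "cover every pair of feature values with as few configurations as possible" means.

```python
# Toy illustration of pairwise (2-way) covering configurations.
# Feature names and the explicitly enumerated valid configurations are made up;
# a real tool would work from a feature model with constraints.
from itertools import combinations

features = ["video", "encryption", "recording"]          # boolean features
valid_configs = [                                        # assume constraints already applied
    {"video": True,  "encryption": True,  "recording": False},
    {"video": True,  "encryption": False, "recording": True},
    {"video": False, "encryption": True,  "recording": True},
    {"video": False, "encryption": False, "recording": False},
    {"video": True,  "encryption": True,  "recording": True},
]

def pairs(config):
    """All 2-way feature-value interactions exercised by one configuration."""
    return {((f1, config[f1]), (f2, config[f2])) for f1, f2 in combinations(features, 2)}

required = set().union(*(pairs(c) for c in valid_configs))
selected, covered = [], set()
while covered != required:
    # Greedily pick the configuration covering the most still-uncovered pairs.
    best = max(valid_configs, key=lambda c: len(pairs(c) - covered))
    selected.append(best)
    covered |= pairs(best)

print(f"{len(selected)} configurations cover all {len(required)} pairs")
```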
Practical Pairwise Testing for Software Product Lines
One key challenge for software product lines is efficiently managing variability throughout their lifecycle. In this paper, we address the problem of variability in software product line testing. We (1) identify a set of issues that must be addressed to make software product line testing work in practice and (2) provide a framework that combines a set of techniques to solve these issues. The framework integrates feature modelling, combinatorial interaction testing, and constraint programming techniques. First, we extract variability in a software product line as a feature model with specified feature interdependencies. We then employ an algorithm that generates a minimal set of valid test cases covering all 2-way feature interactions within a given time interval. Furthermore, we evaluate the framework on an industrial SPL and show that using the framework saves time and provides better test coverage. In particular, our experiments show that the framework improves industrial testing practice in terms of (i) a 17% smaller set of test cases that are (a) valid and (b) guarantee full 2-way feature coverage (as opposed to 19.2% 2-way feature coverage in the manually created test set), and (ii) full flexibility in adjusting test generation to the available testing time.
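As a complement, the 2-way coverage percentage reported in such an evaluation (e.g. the 19.2% figure for the manually created test set) can be computed as the fraction of required feature-value pairs that a given test set exercises. The two-feature model below is invented purely for illustration.

```python
# Measuring 2-way feature coverage of a given test set (illustrative data).
from itertools import combinations, product

features = {"codec": ["h264", "vp8"], "hd": [True, False]}   # toy feature model

# All 2-way feature-value interactions a complete test set must cover.
required = {((f1, v1), (f2, v2))
            for f1, f2 in combinations(features, 2)
            for v1, v2 in product(features[f1], features[f2])}

def two_way_coverage(test_set):
    covered = set()
    for cfg in test_set:
        for f1, f2 in combinations(features, 2):
            covered.add(((f1, cfg[f1]), (f2, cfg[f2])))
    return len(covered & required) / len(required)

manual_tests = [{"codec": "h264", "hd": True}, {"codec": "h264", "hd": False}]
print(f"2-way coverage: {two_way_coverage(manual_tests):.1%}")   # 50.0% here
```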
Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning
In maritime traffic surveillance, detecting illegal activities, such as
illegal fishing or the transshipment of illicit products, is a crucial task of the
coastal administration. In the open sea, one has to rely on Automatic
Identification System (AIS) messages transmitted by on-board transponders and
captured by surveillance satellites. However, dishonest vessels often
intentionally shut down their AIS transponders to hide illegal activities. In
the open sea, it is very challenging to differentiate intentional AIS shutdowns
from missing reception due to protocol limitations, bad weather conditions, or
unfavourable satellite positions. This paper presents a novel approach for the
detection of abnormal AIS missing reception based on self-supervised deep
learning techniques and transformer models. Using historical data, the trained
model predicts whether a message should be received in the upcoming minute.
The model then reports detected anomalies by comparing its prediction with what
is actually observed. Our method processes AIS messages in real time, handling
more than 500 million AIS messages per month, corresponding to the trajectories
of more than 60,000 ships. The method is evaluated on one year
of real-world data coming from four Norwegian surveillance satellites. Using
related research results, we validated our method by rediscovering already
detected intentional AIS shutdowns.
Comment: IEEE Transactions on Intelligent Transportation Systems
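A minimal sketch of the detection logic described above, assuming a trained model that outputs the probability that a message should be received in the next minute and a fixed decision threshold. The trivial predictor stands in for the paper's transformer model and is purely illustrative.

```python
# Sketch: flag an anomaly when a message was expected but not received.
# The predictor is a placeholder for the trained self-supervised transformer;
# the threshold and feature encoding are assumptions.

EXPECTED_THRESHOLD = 0.5  # assumed decision threshold on the predicted probability

def predict_reception_probability(recent_gaps_minutes):
    """Placeholder for the trained model: the more regular the recent
    reception history, the more likely the next message is expected."""
    if not recent_gaps_minutes:
        return 0.0
    mean_gap = sum(recent_gaps_minutes) / len(recent_gaps_minutes)
    return 1.0 / mean_gap if mean_gap >= 1.0 else 1.0

def detect_shutdown(history, received_next_minute):
    """Report an intentional-shutdown candidate when reception was expected
    in the upcoming minute but nothing arrived."""
    p_expected = predict_reception_probability(history)
    return p_expected >= EXPECTED_THRESHOLD and not received_next_minute

# A vessel reporting roughly every minute suddenly goes silent:
print(detect_shutdown(history=[1, 1, 1, 1], received_next_minute=False))  # True
```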
Opening the Software Engineering Toolbox for the Assessment of Trustworthy AI
Trustworthiness is a central requirement for the acceptance and success of human-centered artificial intelligence (AI). To deem an AI system trustworthy, it is crucial to assess its behaviour and characteristics against a gold standard of Trustworthy AI, consisting of guidelines, requirements, or mere expectations. While AI systems are highly complex, their implementations are still based on software. The software engineering community has a long-established toolbox for the assessment of software systems, especially in the context of software testing. In this paper, we argue for the application of software engineering and testing practices to the assessment of trustworthy AI. We make the connection between the seven key requirements defined by the European Commission's AI High-Level Expert Group and established procedures from software engineering, and raise questions for future work.
Adversarial Deep Reinforcement Learning for Improving the Robustness of Multi-agent Autonomous Driving Policies
Autonomous cars are well known for being vulnerable to adversarial attacks
that can compromise the safety of the car and pose danger to other road users.
To effectively defend against adversaries, it is required to not only test
autonomous cars for finding driving errors, but to improve the robustness of
the cars to these errors. To this end, in this paper, we propose a two-step
methodology for autonomous cars that consists of (i) finding failure states in
autonomous cars by training the adversarial driving agent, and (ii) improving
the robustness of autonomous cars by retraining them with effective adversarial
inputs. Our methodology supports testing autonomous cars in a multi-agent
environment, where we train and compare adversarial car policies with two custom
reward functions to test the driving control decisions of autonomous cars. We run
experiments in a vision-based, high-fidelity simulated urban driving
environment. Our results show that adversarial testing can be used for finding
erroneous autonomous driving behavior, followed by adversarial training for
improving the robustness of deep reinforcement learning based autonomous
driving policies. We demonstrate that autonomous cars retrained using the
effective adversarial inputs noticeably improve the performance of their
driving policies in terms of reduced collision and off-road steering errors.
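A toy sketch of the two-step methodology, assuming a placeholder driving policy and a random-search adversary instead of the trained adversarial RL agent and simulator used in the paper; it only illustrates the find-failures-then-retrain loop.

```python
# Step (i): an adversary searches for states where the driving policy fails.
# Step (ii): the policy is retrained on those failure cases.
# Environment, policy, and update rule are placeholders for illustration.
import random

def driving_policy(state, memory):
    # Placeholder policy: fails on hard states it has not been retrained on.
    return "safe" if state in memory else ("crash" if state > 0.8 else "safe")

def adversarial_agent(policy, memory, episodes=200):
    """Step (i): collect states that make the policy fail."""
    failures = []
    for _ in range(episodes):
        state = random.random()                    # adversarially sampled scenario
        if policy(state, memory) == "crash":
            failures.append(state)
    return failures

def retrain(memory, failures):
    """Step (ii): retrain on the collected failure states
    (here simply memorised; in practice, added to the RL training loop)."""
    memory.update(failures)
    return memory

memory = set()
failures = adversarial_agent(driving_policy, memory)
memory = retrain(memory, failures)
print(f"found {len(failures)} failures; retrained policy now handles them")
```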
Software Testing for Machine Learning
Machine learning has become prevalent across a wide variety of applications. Unfortunately, machine learning has also been shown to be susceptible to deception, leading to errors and even fatal failures. This circumstance calls into question the widespread use of machine learning, especially in safety-critical applications, unless we are able to assure its correctness and trustworthiness properties. Software verification and testing are established techniques for assuring such properties, for example by detecting errors. However, software testing challenges for machine learning are vast and profuse, yet critical to address. This summary talk discusses the current state of the art in software testing for machine learning. More specifically, it discusses six key challenge areas for software testing of machine learning systems, examines current approaches to these challenges, and highlights their limitations. The paper provides a research agenda with elaborated directions for making progress toward advancing the state of the art in testing of machine learning.